Counting What Counts: Decompounding for Keyphrase Extraction
Abstract
A core assumption of keyphrase extraction is that a concept is more important if it is mentioned more often in a document. Especially in languages like German that form large noun compounds, frequency counts might be misleading as concepts “hidden” in compounds are not counted. We hypothesize that using decompounding before counting term frequencies may lead to better keyphrase extraction. We identified two effects of decompounding: (i) enhanced frequency counts, and (ii) more keyphrase candidates. We created two German evaluation datasets to test our hypothesis and analyzed the effect of additional decompounding for keyphrase extraction.
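The two effects named in the abstract can be illustrated with a small sketch. It is not the authors' system: the greedy dictionary-based splitter, the toy lexicon, and the names `split_compound` and `term_frequencies` are all assumptions made for illustration, and German linking elements (such as the "s" in "Arbeitszeit") are ignored.

```python
from collections import Counter

# Hypothetical toy lexicon; a real splitter would rely on a large
# dictionary or corpus statistics.
LEXICON = {"schule", "musik", "grund"}


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Split `word` into lexicon entries, trying the longest prefix first.

    Returns [word] (lowercased) when no complete split is found.
    """
    def helper(rest):
        if not rest:
            return []
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no complete split of `rest`

    lowered = word.lower()
    parts = helper(lowered)
    return parts if parts else [lowered]


def term_frequencies(tokens, decompound=False):
    """Count tokens; optionally also count the parts of each compound."""
    counts = Counter()
    for token in tokens:
        counts[token.lower()] += 1
        if decompound:
            parts = split_compound(token)
            if len(parts) > 1:  # only true compounds contribute parts
                counts.update(parts)
    return counts


tokens = ["Schule", "Musikschule", "Grundschule", "Musik"]
plain = term_frequencies(tokens)
decompounded = term_frequencies(tokens, decompound=True)
print(plain["schule"], decompounded["schule"])   # effect (i): higher count
print("grund" in plain, "grund" in decompounded)  # effect (ii): new candidate
```

In this toy input, "Schule" is counted once without decompounding and three times with it, while "grund" only becomes a candidate after splitting. Keeping the full compound alongside its parts is a design choice of the sketch, not something the abstract specifies.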
Similar resources
Identifying Descriptive Keyphrases from Scholarly Big Data
The large and growing amounts of online textual data present both challenges and opportunities to enhance knowledge discovery. One important challenge is to automatically extract a small set of keyphrases from a document that can accurately describe the document’s content and can facilitate fast information processing. In this paper, we explore artificial intelligence approaches to keyphrase ex...
Unsupervised Keyphrase Extraction for Search Ontologies
Ontology learning today ranges from simple frequency counting methods to advanced linguistic analyses of sentence structure and word semantics. For ontologies in information retrieval systems, class concepts and hierarchical relationships at the appropriate level of detail are crucial to the quality of retrieval. In this paper, we present an unsupervised keyphrase extraction system and evaluate...
State of the Art of Automatic Keyphrase Extraction Methods (État de l'art des méthodes d'extraction automatique de termes-clés) [in French]
This article presents the state of the art of automatic keyphrase extraction methods. The aim of the automatic keyphrase extraction task is to extract the most representative terms of a document. Automatic keyphrase extraction methods can be divided into two categories: supervised methods and unsupervised methods. For supervised me...
Approximate Matching for Evaluating Keyphrase Extraction
We propose a new evaluation strategy for keyphrase extraction based on approximate keyphrase matching. It corresponds well with human judgments and is better suited to assess the performance of keyphrase extraction approaches. Additionally, we propose a generalized framework for comprehensive analysis of keyphrase extraction that subsumes most existing approaches, which allows for fair testing ...
WikiRank: Improving Keyphrase Extraction Based on Background Knowledge
A keyphrase is an efficient representation of the main idea of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on the background knowledge from Wikipedia. Firstly, we construct a semantic graph for the docum...
Publication date: 2015